ABSTRACT

The  work  primarily  focused  on  two  lines  of  research. 

1.  We  propose  new  greedy  algorithms  for  learning  the  structure  of  a  graphical  model  of  a  probability  distribution,  given  samples  drawn  from 
the  distribution.  Our  research  modifies  greedy  algorithms  through  appropriate  node  pruning,  to  result  in  fast  algorithms  that  provide 
analytical  guarantees  on  correctness. 

2.  The  objective  of  this  line  of  work  is  to  use  noisy  measurements  from  cascades  -  stochastic  processes  for  spread  on  graphs  -  to  learn  the 
spread  of  information  /  opinion  /  malware.  Our  approach  for  this  learning  problem  has  been  to  view  this  as  hypothesis  testing  on  graphs  - 
given  noisy  and  partial  information  on  both  node  states  and  the  network  graph,  we  formulate  the  problem  as  distinguishing  between  a  benign 
hypothesis  (no  spreading  process)  and  a  malicious  hypothesis  (spreading  process  such  as  malware).  This  approach  has  been  used  in  a 
sequence of studies, starting from distinguishing with partial information, to that with nodes that are adversarial (nodes could lie about their
state),  to  dealing  with  noisy  network  knowledge.  We  have  also  been  able  to  use  this  approach  to  learn  the  identity  of  communities  with 
shared  interests. 
Technology Transfer


Greedy  Learning  of  Graphical  Models 


We  propose  new  greedy  algorithms  for  learning  the  structure  of  a  graphical  model  of  a 
probability  distribution,  given  samples  drawn  from  the  distribution.  While  structure  learning  of 
graphical  models  is  a  widely  studied  problem  with  several  existing  methods,  greedy  approaches 
remain  attractive  due  to  their  low  computational  cost. 

The  most  natural  greedy  algorithm  would  be  one  which,  essentially,  adds  neighbors  to  a  node  in 
sequence  until  stopping;  it  would  do  this  for  each  node.  While  it  is  fast,  simple  and  parallel,  this 
naive  greedy  algorithm  has  the  tendency  to  add  non-neighbors  that  show  high  correlations  with 
the  given  node.  Our  new  algorithms  overcome  this  problem  in  three  different  ways.  The 
recursive  greedy  algorithm  iteratively  recovers  the  neighbors  by  running  the  greedy  algorithm  in 
an inner loop, but each time only adding the last added node to the neighborhood set. The second,
the forward-backward greedy algorithm, includes a node deletion step in each iteration that allows
non-neighbors that may have been added in previous steps to be removed from the neighborhood set.
Finally, the greedy algorithm with pruning runs the greedy algorithm until completion and
then  removes  all  the  incorrect  neighbors.  We  provide  both  analytical  guarantees  and  empirical 
performance  for  our  algorithms.  We  show  that  in  graphical  models  with  strong  non-neighbor 
interactions,  our  greedy  algorithms  can  correctly  recover  the  graph,  whereas  the  previous  greedy 
and  convex  optimization  based  algorithms  do  not  succeed. 
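The greedy-with-pruning idea can be sketched as follows. This is a minimal illustration, not the published algorithm: it assumes discrete variables, scores candidates by empirical conditional entropy, and uses a single threshold `eps` for both the stopping rule and the pruning rule. The names `cond_entropy`, `greedy_with_pruning` and `sample_chain`, and the three-node chain used as test data, are hypothetical and chosen only for this example.

```python
import math
import random
from collections import Counter


def cond_entropy(samples, i, cond):
    """Empirical conditional entropy H(X_i | X_cond), in nats."""
    n = len(samples)
    joint = Counter((row[i],) + tuple(row[j] for j in cond) for row in samples)
    marg = Counter(tuple(row[j] for j in cond) for row in samples)
    return -sum(c / n * math.log(c / marg[key[1:]]) for key, c in joint.items())


def greedy_with_pruning(samples, i, eps):
    """Estimate the neighborhood of node i (assumed scoring rule, see lead-in)."""
    p = len(samples[0])
    nbrs = []
    h = cond_entropy(samples, i, nbrs)
    # Greedy phase: add the node that lowers the conditional entropy the most,
    # and stop once the best reduction falls below eps.
    while True:
        scores = {j: cond_entropy(samples, i, nbrs + [j])
                  for j in range(p) if j != i and j not in nbrs}
        if not scores:
            break
        best = min(scores, key=scores.get)
        if h - scores[best] < eps:
            break
        nbrs.append(best)
        h = scores[best]
    # Pruning phase: drop any node whose removal raises the entropy by less than eps.
    for j in list(nbrs):
        rest = [k for k in nbrs if k != j]
        if cond_entropy(samples, i, rest) - cond_entropy(samples, i, nbrs) < eps:
            nbrs = rest
    return sorted(nbrs)


def sample_chain(n, flip=0.1, seed=0):
    """Draw n samples from a binary Markov chain X0 - X1 - X2 (toy data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.randint(0, 1)
        x1 = x0 ^ (rng.random() < flip)
        x2 = x1 ^ (rng.random() < flip)
        rows.append((x0, x1, x2))
    return rows


samples = sample_chain(5000)
graph = {i: greedy_with_pruning(samples, i, eps=0.05) for i in range(3)}
print(graph)
```

In the toy chain, node 2 is correlated with node 0 but is not its neighbor. Once node 1 is in the neighborhood set, adding node 2 brings almost no further entropy reduction, so the stopping rule leaves it out. The pruning pass matters in harder cases, where a correlated non-neighbor is added before the true neighbors.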

A.  Ray,  S.  Sanghavi  and  S.  Shakkottai,  “Improved  Greedy  Algorithms  for  Learning  Graphical 
Models”.  IEEE  Transactions  on  Information  Theory,  Volume  61,  Number  6,  pp.  3457  -  3468, 
June  2015. 

Epidemic  Spread  and  Detection 

Objective:  The  objective  of  this  project  is  to  use  noisy  measurements  from  cascades  -  stochastic 
processes  for  spread  on  graphs  -  to  infer  the  spread  of  information  /  opinion  /  malware. 

Approach:  Our  approach  has  been  to  treat  the  problem  as  hypothesis  testing  on  graphs  -  given 
noisy  and  partial  information  on  both  node  states  and  the  network  graph,  we  formulate  the 
problem  as  distinguishing  between  a  benign  hypothesis  (no  spreading  process)  and  a  malicious 
hypothesis  (spreading  process  such  as  malware). 

Scientific Barriers: The high dimensionality of the problem, along with noisy, partial and
potentially adversarial node information, renders the classical maximum likelihood approach
intractable. We have instead developed novel methods based on concentration of measure on
graphs, yielding simple algorithms (low-degree polynomial in the graph size) that
distinguish between the hypotheses with theoretical guarantees on performance (in the
regime where the network size is large).
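To make the hypothesis-testing view concrete, here is a small sketch of one possible test. The statistic is an illustrative stand-in and not the method from the cited papers: it counts edges whose two endpoints are both reported as infected and compares that count with its expectation if the same number of reports were placed uniformly at random (the benign hypothesis). The function names, the ring graph and the `factor` threshold are assumptions made for this example.

```python
def internal_edges(edges, reported):
    """Number of edges with both endpoints in the reported-infected set."""
    reported = set(reported)
    return sum(1 for u, v in edges if u in reported and v in reported)


def expected_internal_edges(num_nodes, num_edges, k):
    """Expected internal edges when k reported nodes are placed uniformly at random.

    A fixed edge has both endpoints in a uniform random k-subset with
    probability k(k-1) / (n(n-1)).
    """
    return num_edges * k * (k - 1) / (num_nodes * (num_nodes - 1))


def looks_like_spread(edges, num_nodes, reported, factor=3.0):
    """Reject the benign hypothesis if reports cluster far more than chance.

    `factor` is an arbitrary illustrative threshold, not a calibrated one.
    """
    k = len(set(reported))
    observed = internal_edges(edges, reported)
    baseline = expected_internal_edges(num_nodes, len(edges), k)
    return observed > factor * baseline


# Toy network: a ring of 100 nodes.
n = 100
ring = [(i, (i + 1) % n) for i in range(n)]

clustered = list(range(10))         # ten adjacent nodes, as a local spread would give
scattered = list(range(0, n, 10))   # ten nodes spread evenly around the ring

print(looks_like_spread(ring, n, clustered))
print(looks_like_spread(ring, n, scattered))
```

The statistic takes time linear in the number of edges, in the spirit of the low-degree polynomial algorithms described above. A test with guarantees would also need to account for false and missing reports, which this sketch ignores.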

Significance: This binary hypothesis view has been used in a sequence of works, starting from
distinguishing with partial information, to that with nodes that are adversarial (nodes could lie
about their state), to dealing with noisy network knowledge. We have also been able to use this
approach to learn the identity of communities with shared interests (topic modeling with network
state). Finally, we have used these theoretical ideas to develop a practical platform to detect the
presence of malware on smartphone platforms.

As  part  of  this  broader  study,  we  have  also  developed  new  understanding  of  epidemic  spread 
with  a  variety  of  assumptions  (e.g.  what  if  the  spread  has  bounded  susceptibility,  and  there  are 
adversarial  nodes  that  assist  the  spread). 

Accomplishments:  A  number  of  papers  in  premier  conferences  and  journals  in  the  field  (listed 
below).  Further,  a  student  working  on  this  project  (Dr.  Siddhartha  Banerjee)  graduated  with  his 
Ph.D.  and  has  joined  the  Operations  Research  and  Information  Engineering  (ORIE)  Department 
at  Cornell  University  as  a  tenure-track  Assistant  Professor. 

E. Meirom, C. Caramanis, S. Mannor, S. Shakkottai and A. Orda, “Localized epidemic detection
in networks with overwhelming noise”. Proceedings of ACM Sigmetrics (poster paper),
Portland, OR, June 2015.

C. Milling, C. Caramanis, S. Mannor and S. Shakkottai, “Distinguishing Infections on Different
Graph Topologies”. IEEE Transactions on Information Theory, Volume 61, Number 6, June 2015.

C. Milling, C. Caramanis, S. Mannor and S. Shakkottai, “Local Detection of Infections in
Heterogeneous Networks”. Proceedings of IEEE Infocom, Hong Kong, 2015.

S. Banerjee, A. Gopalan, A. Das and S. Shakkottai, “Epidemic Spreading with External Agents”.
IEEE Transactions on Information Theory, Volume 60, Issue 7, pp. 4125 - 4138, July 2014.

S. Banerjee, A. Chatterjee and S. Shakkottai, “Epidemic Thresholds with External Agents”.
Proceedings of IEEE Infocom, Toronto, Canada, April 2014. (19% acceptance)

S. Krishnasamy, S. Banerjee and S. Shakkottai, “The Behavior of Epidemics under Bounded
Susceptibility”. Proceedings of ACM Sigmetrics, Austin, TX, June 2014. (17% acceptance)

A. Ray, S. Sanghavi and S. Shakkottai, “Topic Modeling from Network Spread”. Proceedings of
ACM Sigmetrics (poster paper), Austin, TX, June 2014.

Conclusions:  Our  approach  based  on  hypothesis  testing  on  graphs  provides  a  new  tool  for 
detecting  malware  (or  more  generally,  spreading  processes).  This  has  a  variety  of  applications, 
ranging  from  detecting  communities  with  shared  interests  in  networks,  to  detecting  new 
malware.  We  have  explored  various  applications  using  real  datasets,  and  shown  the  benefits  of 
our  approach. 


[Figure: table of binary (0/1) membership of 20 websites in interest groups IG 1 through IG 5, with Interest Group 1 and Interest Group 4 marked. Websites, in row order: 1 bbc.co.uk, 2 en.wikipedia.org, 3 freerepublic.com, 4 ibtimes.com, 5 examiner.com, 6 nypost.com, 7 entertainment.msn.com, 8 thetelegraph.com, 9 dailypost.co.uk, 10 sundaysun.co.uk, 11 post-gazette.com, 12 ca.biz.yahoo.com, 13 courier-journal.com, 14 nbc26.com, 15 freep.com, 16 nydailynews.com, 17 rocketnews.com, 18 liverpooldailypost.co.uk, 19 washingtontimes.com, 20 sleafordstandard.co.uk.]

Topic  modeling  from  network  spread  -  new  algorithms  for  determining  topics  for  each  website 
using  network  spread  models.  The  dataset  for  this  study  is  from  the  Stanford  Network  Analysis 
Project  (SNAP).  The  algorithm  ideas  are  described  in  the  paper:  “Topic  Modeling  from  Network 
Spread,”  A.  Ray,  S.  Sanghavi  and  S.  Shakkottai.  Proceedings  of  ACM  Sigmetrics  (poster  paper), 
Austin,  TX  June  2014. 


